US Crime Rate Map by State (Violent & Property)

Author

Ziwen Lu, Xuyan Xiu, Doris Yan

To better understand the data, we used the plotly package to map average violent and property crime rates across the U.S. from 2009 to 2014, state by state.

Data wrangling

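library(icpsrdata)  # icpsr_download() (assumed to come from the icpsrdata package)
library(haven)      # read_dta()
library(here)       # here()
library(tidyverse)  # dplyr, tidyr, tibble
library(plotly)     # plot_geo(), subplot()

# Read ICPSR credentials from environment variables instead of hard-coding them.
# ICPSR_EMAIL and ICPSR_PASSWORD are placeholder names, set e.g. in ~/.Renviron.
options("icpsr_email" = Sys.getenv("ICPSR_EMAIL"), 
        "icpsr_password" = Sys.getenv("ICPSR_PASSWORD"))
# the download takes 1-2 mins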

icpsr_download(
  file_id = 38649,
  download_dir = here("data")
)

crime <- 
  read_dta(here("data","ICPSR_38649",
                "DS0001",
                "38649-0001-Data.dta"))

Data organizing

We cleaned up the data by converting variable names to lowercase and editing variable labels, then calculated the number of crimes per 100,000 people in each county.

\[ \text{Rate} = \left( \frac{\text{Crime}}{\text{Population}} \right) \times 10^{5} \tag{1}\]
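
As a quick check of Equation 1, the sketch below applies it to made-up numbers. The counts and populations are illustrative and not taken from the ICPSR file, and `crime_rate` is a hypothetical helper name rather than part of the original code.

```r
# Hypothetical helper: crimes per 100,000 residents (Equation 1)
crime_rate <- function(crime, population) {
  (crime / population) * 1e5
}

# Illustrative values only, not taken from the ICPSR data
crime_rate(150, 50000)   # 300 crimes per 100,000 people

# A zero population yields Inf (or NaN for 0 / 0),
# the kind of value the filters on the rate columns are meant to catch
crime_rate(5, 0)         # Inf
crime_rate(0, 0)         # NaN
```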

# crime data 

crime_data <- 
  crime|>
  rename_all(tolower)|>
  rename(county=stcofips)|>
  
  mutate(viol = murder+rape+robbery+agasslt)|>
  mutate(property = burglry+larceny+mvtheft)|>
  
  select(county, year, viol, property, cpopcrim) |>
  mutate(viol_rate = (viol / cpopcrim) * 100000) |>
  mutate(property_rate = (property / cpopcrim) * 100000) |>
  
  filter(!is.na(viol_rate) | !is.na(property_rate)) |>
  filter(!(is.infinite(viol_rate) | is.infinite(property_rate)))

attr(crime_data$viol, "label") <-
  "Total violent crimes reported (MURDER + RAPE + ROBBERY + AGASSLT)"
attr(crime_data$property, "label")<-
  "Total property crimes reported (BURGLRY + LARCENY + MVTHEFT)"
attr(crime_data$viol_rate, "label")<-
  "Violent crimes rate"
attr(crime_data$property_rate, "label")<-
  "Property crimes rate"

We then grouped counties into states using the first two digits of the five-digit county FIPS code, which makes differences between regions easier to compare visually.

# Extract state-level FIPS codes (first two digits of county FIPS codes);
# this assumes county is a five-character string with leading zeros

crime_data$state_fips <- 
  substr(crime_data$county, 
         1, 2)

state_fips_to_abb <- 
  tibble(
  state_fips = c("01", "02", "04", "05", "06", 
                 "08", "09", "10", "11", "12", 
                 "13", "15", "16", "17", "18", 
                 "19", "20", "21", "22", "23", 
                 "24", "25", "26", "27", "28", 
                 "29", "30", "31", "32", "33", 
                 "34", "35", "36", "37", "38", 
                 "39", "40", "41", "42", "44", 
                 "45", "46", "47", "48", "49", 
                 "50", "51", "53", "54", "55", "56"),
  state_abb = c("AL", "AK", "AZ", "AR", "CA", "CO", 
                "CT", "DE", "DC", "FL", "GA", "HI", 
                "ID", "IL", "IN", "IA", "KS", "KY", 
                "LA", "ME", "MD", "MA", "MI", "MN", 
                "MS", "MO", "MT", "NE", "NV", "NH", 
                "NJ", "NM", "NY", "NC", "ND", "OH", 
                "OK", "OR", "PA", "RI", "SC", "SD", 
                "TN", "TX", "UT", "VT", "VA", "WA", 
                "WV", "WI", "WY")
)
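
The `substr()` step above relies on `county` being stored as a five-character string with leading zeros. If the column were read as a number instead (an assumption worth checking, not something the source states), Alabama's `01001` would become `1001` and its first two characters would be `10`, which is Delaware's code. A minimal sketch of a guard, using made-up codes:

```r
# Illustrative county codes stored as numbers (leading zeros lost)
county_num <- c(1001, 48201, 6037)

# Naive extraction gives wrong state codes for the four-digit values
substr(as.character(county_num), 1, 2)   # "10" "48" "60"

# Pad back to five digits before taking the first two characters
county_chr <- sprintf("%05d", as.integer(county_num))
state_code <- substr(county_chr, 1, 2)
state_code                               # "01" "48" "06"
```

If `county` is already a zero-padded string, the padding step is unnecessary and the original `substr()` call works as written.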

Choropleth map

Finally, we chose a choropleth map to show the average violent and property crime rates in the U.S. from 2009 to 2014. To keep the visualization informative, we used the plotly package to add interactive hover text showing each state's rate. Readers can also zoom in and out for detail.

# Convert county-level data to state-level data by averaging the violent and property crime rates

crime_data_map <- 
  crime_data |>
  group_by(state_fips) |>
  summarise(
    avg_viol_rate = mean(viol_rate, na.rm = TRUE),
    avg_property_rate = mean(property_rate, na.rm = TRUE)
  ) |>
  
  left_join(state_fips_to_abb, 
            by = 'state_fips') |>
  
  mutate(
    # paste0() avoids the extra spaces that paste() inserts between pieces
    hover_violence = paste0(state_abb, 
                            '<br>', 
                            "Violent Crimes Rate: ", 
                            round(avg_viol_rate, 2)),
    hover_property = paste0(state_abb, 
                            '<br>', 
                            "Property Crimes Rate: ", 
                            round(avg_property_rate, 2))  
  ) |>
  
  pivot_longer(
    cols = c(avg_viol_rate, 
             avg_property_rate),
    names_to = "crime_type",
    values_to = "crime_count"
  )
# Create the map of average violent crime rates

violence_map <- 
  plot_geo(crime_data_map |>
             filter(crime_type == "avg_viol_rate"),
           locationmode = 'USA-states') |>
  
  add_trace(
    z = ~crime_count, 
    text = ~hover_violence,  
    hoverinfo = 'text',     
    locations = ~state_abb,
    color = ~crime_count,
    zmin = 0, 
    zmax = 1000,
    colors = c("#1a9641", "#ffffbf", "#fdae61", "#d7191c"),
    colorbar = list(title = "Violent Crimes Rate",
                    tickvals = c(250, 500, 750))
  ) |>
  
  layout(
    title = 'US Crimes Rate by State, 2009 - 2014',
    geo = list(
      scope = 'usa',
      projection = list(type = 'albers usa'),
      showlakes = TRUE,
      lakecolor = toRGB('white')
    )
  )

# Create the map of average property crime rates

property_map <- 
  plot_geo(crime_data_map |>
             filter(crime_type == "avg_property_rate"),
           locationmode = 'USA-states') |>
  
  add_trace(
    z = ~crime_count, 
    text = ~hover_property,   
    hoverinfo = 'text',       
    locations = ~state_abb,
    color = ~crime_count,
    zmin = 0, 
    zmax = 5000,
    colors = c("#2b83ba", "#ffffbf", "#fdae61", "#d7191c"),
    colorbar = list(title = "Property Crimes Rate",
                    tickvals = c(1000, 2000, 3000, 4000))
  ) |>
  
  layout(
    title = 'US Crimes Rate by State, 2009 - 2014',
    geo = list(
      scope = 'usa',
      projection = list(type = 'albers usa'),
      showlakes = TRUE,
      lakecolor = toRGB('white')
    )
  )

# Stack the two maps vertically (two rows)

fig <- subplot(violence_map, 
               property_map, 
               nrows = 2, 
               margin = 0.05) 

# Print the figure

fig

Areas with high crime rates appear in darker red. Crime rates tend to be higher in the South and near the border. There are also some interesting findings: Texas, which is usually considered to have a high rate of gun ownership, has an average violent crime rate below 300 per 100,000 people. That figure may be influenced by population, since each state's value is an unweighted mean of county-level rates, so sparsely populated counties count as much as large ones.
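
One way population can matter here: `avg_viol_rate` is an unweighted mean of county-level rates, so a small rural county counts as much as a large city. The sketch below uses invented numbers for a two-county state (not real data) to show how that average can differ from the pooled rate, meaning total crimes divided by total population.

```r
# Invented two-county state: one large urban county, one small rural county
toy <- data.frame(
  county   = c("urban", "rural"),
  viol     = c(5000, 10),        # violent crimes reported
  cpopcrim = c(1000000, 10000)   # covered population
)
toy$viol_rate <- toy$viol / toy$cpopcrim * 1e5   # 500 and 100 per 100,000

# Unweighted mean of county rates (what mean(viol_rate) computes)
unweighted <- mean(toy$viol_rate)

# Pooled rate: all crimes divided by all residents
pooled <- sum(toy$viol) / sum(toy$cpopcrim) * 1e5

round(c(unweighted = unweighted, pooled = pooled), 1)
```

In this toy setup the unweighted mean is 300 while the pooled rate is about 496. Whether the same pattern holds for Texas would need to be checked against the actual data.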